Improved Pattern-Driven Algorithms for Motif Finding in DNA Sequences

Authors

  • Sing-Hoi Sze
  • Xiaoyan Zhao
Abstract

To guarantee that the optimal motif is found, traditional pattern-driven approaches perform an exhaustive search over all candidate motifs of length l. We develop an improved pattern-driven algorithm that takes $O(4^l k)$ time, where k is the number of sequences in the sample and l is the motif length. For large enough l, this running time is independent of the length n of each sequence, saving a factor of n in time complexity over the original pattern-driven approach. We further extend this strategy to allow arbitrary don't care positions within a motif without much decrease in the solvable values of l. Testing this algorithm on a large set of yeast samples constructed from co-expressed gene clusters reveals that most biological motifs have many invariant or almost invariant positions, and that these positions can be used to define the motif while ignoring the other positions. This motivates the following two-stage strategy, which substantially extends the solvable values of l for the pattern-driven approach. First, use an $O(2^l k n)$ algorithm to exhaustively search over all candidate motifs that allow arbitrary don't care positions but disallow mismatches. Then, refine these motifs by allowing a limited amount of flexibility to model the almost invariant positions. We demonstrate that this seemingly restrictive motif definition is sufficiently powerful by showing that the performance of this algorithm is comparable to that of the best existing motif finding algorithms on a large benchmark set of samples. A software program implementing these approaches (MotifEnumerator) is available at http://faculty.cs.tamu.edu/shsze/motifenumerator.
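To make the search strategies described above concrete, here is a minimal Python sketch. It is not the MotifEnumerator implementation and does not reproduce the improved $O(4^l k)$ algorithm. All function names are hypothetical. `exhaustive_pattern_search` is the baseline pattern-driven search: it tries all 4^l candidates against every window, using a mismatch threshold d, which is an assumed parameter. `dont_care_patterns` illustrates the first stage of the two-stage strategy. It expands each of the roughly k·n windows into at most 2^l exact patterns, with masked positions written as 'N' (an assumed notation), and no mismatches allowed. `refined_support` is one assumed way to add "limited flexibility" in the second stage: it tolerates a few mismatches at the fixed positions.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exhaustive_pattern_search(seqs: list[str], l: int, d: int) -> list[str]:
    """Baseline pattern-driven search over all 4^l candidate l-mers.

    A candidate is reported if every sequence contains a window within
    Hamming distance d of it (roughly 4^l * k * n window comparisons).
    """
    hits = []
    for letters in product("ACGT", repeat=l):
        cand = "".join(letters)
        if all(
            any(hamming(cand, s[i:i + l]) <= d for i in range(len(s) - l + 1))
            for s in seqs
        ):
            hits.append(cand)
    return hits


def dont_care_patterns(seqs: list[str], l: int, min_support: int,
                       max_dont_care: int) -> dict[str, int]:
    """Stage one (hypothetical sketch): exact patterns with don't care ('N') positions.

    Each length-l window is expanded into the patterns obtained by masking at
    most max_dont_care positions (at most 2^l per window). No mismatches are
    allowed. Returns {pattern: number of sequences containing it} for patterns
    supported by at least min_support sequences.
    """
    support: dict[str, int] = {}
    for s in seqs:
        seen = set()  # patterns found in this sequence; each sequence counts once
        for i in range(len(s) - l + 1):
            window = s[i:i + l]
            for mask in range(1 << l):
                if bin(mask).count("1") > max_dont_care:
                    continue
                seen.add("".join("N" if (mask >> j) & 1 else c
                                 for j, c in enumerate(window)))
        for p in seen:
            support[p] = support.get(p, 0) + 1
    return {p: c for p, c in support.items() if c >= min_support}


def refined_support(pattern: str, seqs: list[str], max_mismatch: int) -> int:
    """Stage two (hypothetical sketch): count sequences having a window that
    matches the pattern with at most max_mismatch mismatches at non-'N' positions."""
    l = len(pattern)

    def ok(window: str) -> bool:
        return sum(p != "N" and p != c for p, c in zip(pattern, window)) <= max_mismatch

    return sum(any(ok(s[i:i + l]) for i in range(len(s) - l + 1)) for s in seqs)


seqs = ["ACGTACGT", "TTACGTTT", "GGGACGTA"]
print(exhaustive_pattern_search(seqs, 4, 0))  # ['ACGT']
print(dont_care_patterns(seqs, 4, 3, 1))
print(refined_support("ACNT", seqs, 0))  # 3
```

The toy sequences share the exact 4-mer ACGT. Allowing one don't care position also surfaces patterns such as ACNT and CGTN. The 2^l expansion per window is what makes the first stage exhaustive without enumerating all 4^l candidates.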

Similar resources

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences

MOTIVATION Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local ...

A Review: Applying Genetic Algorithms for Motif Discovery

This paper explores & reviews the use of genetic algorithms by various researchers as a solution to discover motifs in molecular sequences. This survey talks about the general GA based procedure for motif discovery & reviews the latest developments in DNA motif finding using Genetic algorithms. Although GA approach has not been applied extensively by researchers as compared to other computation...

Subtle motifs: defining the limits of motif finding algorithms

MOTIVATION What constitutes a subtle motif? Intuitively, it is a motif that is almost indistinguishable, in the statistical sense, from random motifs. This question has important practical consequences: consider, for example, a biologist that is generating a sample of upstream regulatory sequences with the goal of finding a regulatory pattern that is shared by these sequences. If the sequences ...

Publication date: 2005